A Graph Theoretic Clustering Algorithm based on the Regularity Lemma and Strategies to Exploit Clustering for Prediction
The fact that clustering is perhaps the most widely used technique for exploratory data analysis underscores its fundamental importance. The general problem statement that broadly describes clustering as the identification and classification of patterns into coherent groups also implicitly indicates its utility in other tasks, such as supervised learning. In the past decade and a half, two developments have altered the landscape of research in clustering: one is improved results from the increased use of graph-theoretic techniques such as spectral clustering, and the other is the study of clustering with respect to its relevance in semi-supervised learning, i.e., using unlabeled data to improve prediction accuracy. In this work an attempt is made to contribute to both of these aspects. Thus our contributions are two-fold: First, we identify some general issues with the spectral clustering framework and, while working towards a solution, introduce a new algorithm, which we call Regularity Clustering, that attempts to harness the power of the Szemerédi Regularity Lemma, a remarkable result from extremal graph theory, for the task of clustering. Second, we investigate some practical and useful strategies for using clustering of unlabeled data to boost prediction accuracy. For all of these contributions we evaluate our methods against existing ones and also apply these ideas in a number of settings.
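As a point of reference for the spectral clustering framework the abstract builds on, here is a minimal sketch of the standard pipeline (Gaussian affinity graph, graph Laplacian, sign cut on the Fiedler vector). This is an illustrative two-cluster toy, not the Regularity Clustering algorithm the thesis proposes; the function name `spectral_bipartition` and all parameters are hypothetical.

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Split points into two clusters via the unnormalized graph Laplacian.

    Pipeline: Gaussian affinities -> Laplacian L = D - W ->
    Fiedler vector (second-smallest eigenvector) -> sign cut.
    """
    # Pairwise squared distances and Gaussian affinity matrix
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Unnormalized graph Laplacian
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)

# Two well-separated blobs: the sign cut recovers the groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)),
               rng.normal(5.0, 0.2, (20, 2))])
labels = spectral_bipartition(X)
```

The practical issues the thesis identifies (scale parameter choice, eigenvector selection) are visible even in this sketch: the result depends on `sigma` and on which eigenvectors are used.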
Discriminative Learning of Similarity and Group Equivariant Representations
One of the most fundamental problems in machine learning is to compare
examples: Given a pair of objects we want to return a value which indicates
degree of (dis)similarity. Similarity is often task specific, and pre-defined
distances can perform poorly, leading to work in metric learning. However,
being able to learn a similarity-sensitive distance function also presupposes
access to a rich, discriminative representation for the objects at hand. In
this dissertation we present contributions towards both ends. In the first part
of the thesis, assuming good representations for the data, we present a
formulation for metric learning that makes a more direct attempt to optimize
for the k-NN accuracy as compared to prior work. We also present extensions of
this formulation to metric learning for kNN regression, asymmetric similarity
learning and discriminative learning of Hamming distance. In the second part,
we consider a situation where we are on a limited computational budget i.e.
optimizing over a space of possible metrics would be infeasible, but access to
a label aware distance metric is still desirable. We present a simple, and
computationally inexpensive approach for estimating a well motivated metric
that relies only on gradient estimates, discussing theoretical and experimental
results. In the final part, we address representational issues, considering
group equivariant convolutional neural networks (GCNNs). Equivariance to
symmetry transformations is explicitly encoded in GCNNs; a classical CNN being
the simplest example. In particular, we present a SO(3)-equivariant neural
network architecture for spherical data, that operates entirely in Fourier
space, while also providing a formalism for the design of fully Fourier neural
networks that are equivariant to the action of any continuous compact group.
Comment: PhD thesis
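The abstract notes that a classical CNN is the simplest example of a group-equivariant network: convolution commutes with the translation group. The Fourier-space view it generalizes can be illustrated with a 1-D circular convolution, which is exactly equivariant to cyclic shifts (a toy analogue; the SO(3)-equivariant spherical case requires spherical harmonics and is beyond this sketch).

```python
import numpy as np

def circular_conv(x, k):
    """1-D circular convolution computed entirely in Fourier space:
    pointwise multiplication of DFTs, then inverse DFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

rng = np.random.default_rng(1)
x = rng.standard_normal(8)   # signal
k = rng.standard_normal(8)   # filter

shift = lambda v, s: np.roll(v, s)

# Equivariance: convolving a shifted signal equals shifting the convolved signal
lhs = circular_conv(shift(x, 3), k)
rhs = shift(circular_conv(x, k), 3)
```

The same algebraic fact, that the group action becomes a simple structured operation in the Fourier domain, is what makes fully Fourier equivariant architectures possible.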
Approximation-Generalization Trade-offs under (Approximate) Group Equivariance
The explicit incorporation of task-specific inductive biases through symmetry
has emerged as a general design precept in the development of high-performance
machine learning models. For example, group equivariant neural networks have
demonstrated impressive performance across various domains and applications
such as protein and drug design. A prevalent intuition about such models is
that the integration of relevant symmetry results in enhanced generalization.
Moreover, it is posited that when the data and/or the model may only exhibit
partial or approximate symmetry, the optimal or
best-performing model is one where the model symmetry aligns with the data
symmetry. In this paper, we conduct a formal unified investigation of these
intuitions. To begin, we present general quantitative bounds that demonstrate
how models capturing task-specific symmetries lead to improved generalization.
In fact, our results do not require the transformations to be finite or even
form a group and can work with partial or approximate equivariance. Utilizing
this quantification, we examine the more general question of model
mis-specification, i.e., when the model symmetries do not align with the data
symmetries. We establish, for a given symmetry group, a quantitative comparison
between the approximate/partial equivariance of the model and that of the data
distribution, precisely connecting model equivariance error and data
equivariance error. Our result delineates conditions under which the model
equivariance error is optimal, thereby yielding the best-performing model for
the given task and data.
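The model equivariance error the abstract connects to data equivariance error can be estimated empirically. Below is a hedged sketch (the estimator `equivariance_error` and both toy models are hypothetical, not the paper's exact quantities): for 2-D rotations it averages ||f(g·x) − g·f(x)|| over sampled group elements and inputs, which is zero exactly when f is equivariant.

```python
import numpy as np

def equivariance_error(f, xs, angles):
    """Empirical equivariance error under 2-D rotations:
    mean of ||f(R x) - R f(x)|| over sampled angles and inputs."""
    errs = []
    for a in angles:
        c, s = np.cos(a), np.sin(a)
        R = np.array([[c, -s], [s, c]])  # rotation by angle a
        for x in xs:
            errs.append(np.linalg.norm(f(R @ x) - R @ f(x)))
    return float(np.mean(errs))

rng = np.random.default_rng(2)
xs = rng.standard_normal((10, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, 10)

# f(x) = x * ||x|| commutes with rotations; the coordinate-wise map does not
equivariant = lambda x: x * np.linalg.norm(x)
non_equiv = lambda x: np.array([x[0] ** 2, x[1]])
```

Comparing such error estimates for the model and for the data distribution is the kind of quantitative alignment question the paper formalizes.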
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Large language models (LLMs) specializing in natural language generation
(NLG) have recently started exhibiting promising capabilities across a variety
of domains. However, gauging the trustworthiness of responses generated by LLMs
remains an open challenge, with limited research on uncertainty quantification
for NLG. Furthermore, existing literature typically assumes white-box access to
language models, which is becoming unrealistic either due to the closed-source
nature of the latest LLMs or due to computational constraints. In this work, we
investigate uncertainty quantification in NLG for LLMs. We
first differentiate two closely related notions: uncertainty, which
depends only on the input, and confidence, which additionally
depends on the generated response. We then propose and compare several
confidence/uncertainty metrics, applying them to selective NLG,
where unreliable results could either be ignored or yielded for further
assessment. Our findings on several popular LLMs and datasets reveal that a
simple yet effective metric for the average semantic dispersion can be a
reliable predictor of the quality of LLM responses. This study can provide
valuable insights for practitioners on uncertainty management when adopting
LLMs. The code to replicate all our experiments is available at
https://github.com/zlin7/UQ-NLG
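The idea of average semantic dispersion among sampled responses can be sketched with a crude lexical stand-in: average pairwise (1 − Jaccard) dissimilarity over word sets. The function `dispersion` and the use of Jaccard similarity are illustrative assumptions; the paper's black-box metrics rely on semantic similarity (e.g. entailment models), not word overlap.

```python
import itertools

def dispersion(responses):
    """Average pairwise lexical dissimilarity (1 - Jaccard over word sets)
    among sampled responses. High dispersion suggests the model is
    'unsure'; low dispersion suggests consistent, likely reliable output."""
    def jaccard(a, b):
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / len(sa | sb)
    pairs = list(itertools.combinations(responses, 2))
    return sum(1.0 - jaccard(a, b) for a, b in pairs) / len(pairs)

# Consistent samples -> zero dispersion; divergent samples -> high dispersion
consistent = ["paris is the capital"] * 3
divergent = ["paris is the capital", "london maybe", "it is berlin"]
```

In a selective-NLG setting, responses whose dispersion exceeds a threshold would be withheld or flagged for further assessment.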